Concept-Instance Relation Extraction from Simple Noun Sequences Using a Full-Text Search Engine

Authors

  • Asuka Sumida
  • Kentaro Torisawa
  • Keiji Shinzato
Abstract

This paper describes a simple method for acquiring concept-instance relations from simple noun sequences that frequently appear in Japanese Web documents. In Japanese, many noun sequences can consist of two NPs that have a concept-instance relation. This phenomenon is similar to apposition in English but differs in that many of these noun sequences do not provide any explicit clues, such as the proper noun capitalization or commas used in English apposition, that indicate the boundary between the concept name and the instance name. We developed a method to detect such implicit boundaries between concept names and instance names, and to filter out erroneous concept-instance relations by using a search engine.

Similar resources

Hierarchical Taxonomy Extraction by Mining Topical Query Sessions

Search engine logs store detailed information on Web users' interactions. Thus, as more and more people use search engines on a daily basis, important trails of users' common knowledge are being recorded in those files. Previous research has shown that it is possible to extract concept taxonomies from full text documents, while other scholars have proposed methods to obtain similar queries from q...

Extracting Arabic Relations from the Web

There is a vast amount of unstructured Arabic information on the Web; this data is always organized in semi-structured text and cannot be used directly. This research proposes a semi-supervised technique that extracts binary relations between two Arabic named entities from the Web. Several works have been performed for relation extraction from Latin texts and as far as we know, there isn’t any ...

A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine

Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used an experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...

Semantic Property Grammars for Knowledge Extraction from Biomedical Text

We present Semantic Property Grammars, designed to extract concepts and relations from biomedical texts. The implementation adapts a CHRG parser we designed for Property Grammars [1], which views linguistic constraints as properties between sets of categories and solves them by constraint satisfaction, can handle incomplete or erroneous text, and extract phrases of interest selectively. We endo...

Full-text Search in Intermediate Data Storage of FCART

The speed of full-text search directly affects the process of text analysis. A search engine creates a text index, which is used for fast full-text search. Solr and ElasticSearch are two popular search engines. A text analysis system requires fast searching and indexing at the same time. This paper describes preprocessing workflow of the analysis system called Formal Concept Analysis...

Publication date: 2006